feat(hub): dependents knobs, frontier_extend step, Resources panel + poll-resilience fixes#51

Merged
caviri merged 8 commits into develop from feat/hub-crawler-dependents on May 18, 2026

@caviri caviri commented May 18, 2026

Three commits, each independently revertable.

Hub form: bidirectional dependency knobs + frontier_extend

  • New form fields: crawl_dependents, crawl_dependencies, min_stars, max_dependents (0 → no cap).
  • Crawler timeout scales with max_rounds (1800s/round), unblocking depth ≥ 3 runs that were hitting the prior flat 30-min ceiling.
  • New pipeline step frontier_extend: re-seeds the crawler with the dependents listed in an existing graph but never explored, then merges the result back. Cheaper than re-crawling with max_rounds++. Off by default.
  • GET /api/pipeline/frontier-preview reports frontier_size + a sample before launching.
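The sentinel mapping and the timeout scaling described above can be sketched as follows. This is a minimal illustration, not the quest builder's actual code: the function names `normalize_max_dependents` and `crawler_timeout_seconds` are hypothetical, and only the `0 → no cap` rule and the 1800s-per-round figure come from this PR.

```python
from typing import Optional

# 1800 seconds per crawl round, as stated in the PR description.
SECONDS_PER_ROUND = 1800


def normalize_max_dependents(value: int) -> Optional[int]:
    """Map the form's 0 sentinel to None, meaning "no cap" (hypothetical helper)."""
    if value < 0:
        raise ValueError("max_dependents must be >= 0")
    return None if value == 0 else value


def crawler_timeout_seconds(max_rounds: int) -> int:
    """Scale the crawler timeout linearly with the number of rounds (hypothetical helper)."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be >= 1")
    return SECONDS_PER_ROUND * max_rounds
```

With this scaling, a depth-3 run gets 5400s rather than the earlier flat 1800s.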

Services: tolerant polling

  • wait_for_completion (crawler) and wait_for_extract (metadata extractor) now swallow up to 5 consecutive transient HTTP errors with a warning each, instead of failing the whole step on a single ReadTimeout / Connection reset. Real terminal states (failed, cancelled, deadline) still propagate immediately.
  • Crawler HTTP timeout bumped 30s → 60s.
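A rough sketch of the consecutive-error budget, assuming the behaviour reported in the test plan (the fifth consecutive failure propagates). `TransientPollError`, `JobFailed`, `wait_with_error_budget`, and the `"completed"` status value are placeholders for illustration; the real `wait_for_completion` / `wait_for_extract` catch httpx errors and may differ in detail.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


class TransientPollError(Exception):
    """Stand-in for a ReadTimeout / connection reset (hypothetical name)."""


class JobFailed(Exception):
    """Raised when the remote job reports a terminal failure state."""


def wait_with_error_budget(
    get_status: Callable[[], str],
    *,
    max_consecutive_errors: int = 5,
    poll_interval_s: float = 2.0,
    deadline_s: float = 1800.0,
) -> str:
    """Poll until the job finishes, tolerating short runs of transient errors."""
    consecutive_errors = 0
    deadline = time.monotonic() + deadline_s
    while True:
        if time.monotonic() >= deadline:
            raise TimeoutError("deadline exceeded while polling")
        try:
            status = get_status()
        except TransientPollError as exc:
            consecutive_errors += 1
            if consecutive_errors >= max_consecutive_errors:
                raise  # budget exhausted: escalate the last error
            logger.warning(
                "transient poll error %d/%d: %s",
                consecutive_errors, max_consecutive_errors, exc,
            )
        else:
            consecutive_errors = 0  # any successful poll resets the budget
            if status == "completed":
                return status
            if status in ("failed", "cancelled"):
                raise JobFailed(status)  # real terminal states propagate at once
        time.sleep(poll_interval_s)
```

Resetting the counter on every successful poll is what makes the budget "consecutive": a long job can survive many scattered drops, but not a sustained outage.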

/admin Resources panel

  • GET /api/admin/resources returns disk usage per mount, MemTotal + MemAvailable, /proc/loadavg, and docker system df totals (with reclaimable bytes per resource class).
  • New /admin page polls every 15 min and colours rows yellow at 75% (disk) / 80% (mem) / 1.0 load-per-core, red at 90% / 92% / 2.0. Stdlib + Docker SDK only; no psutil dependency added.
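For illustration, the colour thresholds and a stdlib-only way to derive the disk and memory percentages might look like the sketch below. The helper names (`row_colour`, `disk_pct`, `mem_pct`), the metric keys, the inclusive `>=` comparison, and the `(MemTotal - MemAvailable) / MemTotal` formula are assumptions; only the threshold values come from this PR.

```python
import shutil

# (yellow, red) thresholds quoted in the PR description.
THRESHOLDS = {
    "disk_pct": (75.0, 90.0),
    "mem_pct": (80.0, 92.0),
    "load_per_core": (1.0, 2.0),
}


def row_colour(metric: str, value: float) -> str:
    """Classify a reading as "red", "yellow", or "ok" (inclusive bounds assumed)."""
    yellow, red = THRESHOLDS[metric]
    if value >= red:
        return "red"
    if value >= yellow:
        return "yellow"
    return "ok"


def disk_pct(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def mem_pct(meminfo_text: str) -> float:
    """Percentage of memory in use, from the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total
```

On a Linux host, `mem_pct(open("/proc/meminfo").read())` would feed the memory row; the load row would divide the 1-minute value from `/proc/loadavg` by the core count.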

Test plan

  • Quest YAML round-trip through Pydantic with the new crawler fields and frontier_extend enabled/disabled.
  • frontier_extend against the live graph: 230 frontier nodes → merged → 422 dependents, frontier shrunk to 15.
  • Poll resilience: simulated 2 transient drops → recovered; 5 consecutive → propagated; real failed → still raises.
  • /admin page renders; /api/admin/resources returns disk/mem/cpu/docker on the live host.

Out of scope (filed separately)

  • open-pulse-crawler: cancelled jobs cannot expose their partial graph via /api/v1/graph. Cost us 2,577 explored repos on one run. Report drafted for the crawler team.

caviri added 8 commits March 6, 2026 05:23
feat: integrate all services
release: prepare v1.0.0 (PyPI rename + Trusted Publisher)
ci(release): add release-please for automated semver releases
release(docs): landing redesign + Docusaurus theming + nodes registry
…end step

Quest builder now emits crawl_dependents / crawl_dependencies / min_stars /
max_dependents from the form, with a max_dependents=0 sentinel mapped to
null (no cap). Crawler timeout also scales with max_rounds (1800s per
round) so deeper crawls don't hit the prior flat 30-min ceiling.

New pipeline step `frontier_extend`: reads an existing crawler-graph.json,
finds the dependents listed but never explored (the BFS frontier),
submits a fresh crawl seeded with those nodes, and merges the result
back over the canonical path. Cheaper than re-running with max_rounds++.
Disabled by default; enable via the hub form or in the quest YAML.
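As a rough illustration of "dependents listed but never explored", the frontier could be computed as below. The graph layout used here (a `nodes` mapping with `explored` and `dependents` keys) is an assumed schema for the example and is not taken from crawler-graph.json; `find_frontier` and `frontier_preview` are hypothetical names. Only the `frontier_size` and sample fields mirror what this PR says the preview reports.

```python
from typing import Any, Dict, List


def find_frontier(graph: Dict[str, Any]) -> List[str]:
    """Return dependents that explored nodes list but that were never explored."""
    nodes = graph["nodes"]
    explored = {name for name, node in nodes.items() if node.get("explored")}
    frontier = set()
    for name in explored:
        for dependent in nodes[name].get("dependents", []):
            if dependent not in explored:
                frontier.add(dependent)
    return sorted(frontier)  # sorted so that previews are deterministic


def frontier_preview(graph: Dict[str, Any], sample_size: int = 5) -> Dict[str, Any]:
    """Summarise the frontier before launching a crawl seeded with it."""
    frontier = find_frontier(graph)
    return {"frontier_size": len(frontier), "sample": frontier[:sample_size]}
```

Seeding a fresh crawl with only these nodes avoids re-visiting everything already explored, which is why it can be cheaper than raising max_rounds.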

The hub form exposes the new step plus GET /api/pipeline/frontier-preview
which reports frontier_size and a sample of seeds before launch.

Polling get_status / get_extract_job under load was failing the whole
pipeline step on a single ReadTimeout or Connection-reset, even though
the job on the other side was still alive and would complete normally.

Both wait_for_completion / wait_for_extract now swallow up to 5
consecutive httpx (or wrapped RuntimeError, for the extractor) poll
failures with a warning each, escalating only when the budget is
exhausted. Real failure modes (status=failed, status=cancelled,
deadline exceeded) still propagate immediately.

Crawler HTTP client timeout bumped 30s -> 60s while we're here.

GET /api/admin/resources returns disk usage per mount, MemTotal +
MemAvailable, /proc/loadavg, and `docker system df` totals (images,
containers, volumes with reclaimable bytes). 30s server-side cache.
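A 30s server-side cache of this kind can be sketched with a small time-based decorator. This is one possible shape under stated assumptions, not the endpoint's actual code: `ttl_cached` and the injectable `clock` parameter are hypothetical and exist here mainly to make the behaviour easy to test.

```python
import time
from typing import Any, Callable, Optional, Tuple


def ttl_cached(ttl_s: float, clock: Callable[[], float] = time.monotonic):
    """Cache a zero-argument function's result for `ttl_s` seconds."""

    def decorator(fn: Callable[[], Any]) -> Callable[[], Any]:
        state: Optional[Tuple[float, Any]] = None  # (timestamp, cached value)

        def wrapper() -> Any:
            nonlocal state
            now = clock()
            if state is None or now - state[0] >= ttl_s:
                state = (now, fn())  # refresh the expired or missing entry
            return state[1]

        return wrapper

    return decorator
```

Wrapping the resource snapshot in `ttl_cached(30.0)` would mean repeated page polls within the window reuse one result instead of re-running the disk and Docker queries.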

The /admin page polls every 15 min by default and colours rows
yellow at 75% (disk) / 80% (mem) / 1.0 load-per-core, red at
90% / 92% / 2.0. Stdlib + Docker SDK only; no psutil added to the
image.

Surfaces the "we are close to the wall" signal early — we crossed 95%
disk usage twice this week before noticing.
@caviri caviri merged commit e8bd52a into develop May 18, 2026
4 of 8 checks passed
@caviri caviri deleted the feat/hub-crawler-dependents branch May 18, 2026 20:20
caviri added a commit that referenced this pull request May 18, 2026
Conflict: src/open_pulse/gui/hub/main.py

Both branches added new routers / imports to the FastAPI app:

  · develop      → ``admin`` router (the /admin Resources panel from
                  PR #51, commit 915151f)
  · this branch  → ``hub`` + ``chaoss_routes`` routers + the
                  _propagate_globals helper that mirrors the shared
                  template env to instances with their own filters

Resolved by keeping all three. New combined wiring:

    app.include_router(crawler.router)
    app.include_router(admin.router)          # develop
    app.include_router(hub.router)             # ours
    app.include_router(hub.api)                # ours
    app.include_router(chaoss_routes.router)   # ours

base.html merged cleanly (the new ``/admin`` nav entry from develop
sits alongside the new ``/chaoss`` nav entry we already added).